Classification of protein 3D folds by hidden Markov learning on sequences of structural alphabets
Abstract
Fragment-based analysis of protein three-dimensional (3D) structures has received increased attention in recent years. Here, we used a set of pentamer local structure alphabets (LSAs) recently derived in our laboratory to represent protein structures; that is, we transformed the 3D structures into one-dimensional (1D) sequences of LSA letters. We then applied hidden Markov model (HMM) training to these LSA sequences to assess their ability to capture features characteristic of 43 populated protein folds. Over the range of alphabet sizes examined (5 to 41 letters), performance was optimal with 20 letters, giving a fold-classification accuracy of 82% in 5-fold cross-validation on training-set structures sharing < 40% pairwise sequence identity at the amino acid level. For test-set structures, the accuracy was as high as for the training set, but fell to 65% for those sharing no more than 25% amino acid sequence identity with the training-set structures. These results suggest that sufficient 3D information can be retained during the drastic 3D-to-1D transformation for use as a framework for developing efficient and useful structural bioinformatics tools.
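The classification step described in the abstract can be illustrated with a minimal sketch. It assumes one discrete HMM per fold has already been trained, scores an LSA sequence under each model with the scaled forward algorithm, and assigns the fold whose model gives the highest log-likelihood. The function names, the two-state models, and the three-letter alphabet below are hypothetical toy values chosen for illustration; they are not the models, parameters, or decision rule reported by the authors.

```python
import math


def forward_log_likelihood(seq, start, trans, emit):
    """Return log P(seq | HMM) using the scaled forward algorithm.

    seq   : non-empty list of integer symbol indices (LSA letters)
    start : start[i]    = P(first hidden state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    n_states = len(start)
    # Initialise with the first symbol.
    alpha = [start[i] * emit[i][seq[0]] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(seq)):
        if t > 0:
            # Propagate one step and absorb the next symbol.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states))
                * emit[j][seq[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        log_lik += math.log(scale)
        # Rescale to avoid numerical underflow on long sequences.
        alpha = [a / scale for a in alpha]
    return log_lik


def classify_fold(seq, models):
    """Pick the fold whose HMM assigns seq the highest log-likelihood."""
    return max(
        models,
        key=lambda name: forward_log_likelihood(seq, *models[name]),
    )


# Toy per-fold models (hypothetical): (start, trans, emit) for each fold.
models = {
    "fold_A": (
        [0.5, 0.5],
        [[0.9, 0.1], [0.1, 0.9]],
        [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]],  # favours letter 0
    ),
    "fold_B": (
        [0.5, 0.5],
        [[0.9, 0.1], [0.1, 0.9]],
        [[0.1, 0.1, 0.8], [0.1, 0.3, 0.6]],  # favours letter 2
    ),
}

print(classify_fold([0, 0, 1, 0], models))
print(classify_fold([2, 2, 1, 2], models))
```

In a real setting, the per-fold parameters would be estimated from training LSA sequences (for example with Baum-Welch), and the alphabet would contain the 5 to 41 letters examined in the study rather than three.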
Similar resources
A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences
The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...
Towards Discovering Structural Signatures of Protein Folds Based on Logical Hidden Markov Models
With the growing number of determined protein structures and the availability of classification schemes, it becomes increasingly important to develop computer methods that automatically extract structural signatures for classes of proteins. In this paper, we introduce and apply a new Machine Learning technique, Logical Hidden Markov Models (LOHMMs), to the task of finding structural signatures ...
Comparing the Bidirectional Baum-Welch Algorithm and the Baum-Welch Algorithm on Regular Lattice
A profile hidden Markov model (PHMM) is widely used in assigning protein sequences to protein families. In this model, the hidden states only depend on the previous hidden state and observations are independent given hidden states. In other words, in the PHMM, only the information of the left side of a hidden state is considered. However, it makes sense that considering the information of the b...
Discretization of 3D protein conformations by learning fragments library and their short range dependence using a Hidden Markov Model
The aim of this study is to discretize protein three-dimensional (3D) conformation with an optimal accuracy. In a previous paper, overlapping 4-peptide fragments describing (3D) conformations of proteins were systematically classified by a Hidden Markov Model (HMM) [2]. Using HMM allows moving the description of 3D structures from the sole geometric aspect towards a more explanatory description...
A Two-Layer Learning Architecture for Multi-Class Protein Folds Classification
The successful completion of many genome sequencing projects produces a massive number of putative protein sequences. However, the number of known three-dimensional (3D) protein structure is growing at a much slower pace. This situation has challenged us to develop computational methods by which the 3D protein structure could be predicted timely from its sequence. Many computational methods hav...
Publication date: 2005